Evaluation of genomic offset predictions in common gardens

Author

Juliette Archambeau

Published

May 30, 2023

In this report, we aim to evaluate the genomic offset predictions from the different GEA methods (LFMM, GF, GDM and RDA) with phenotypic data from five common gardens located in Spain (Asturias, Madrid, Cáceres), Portugal (Fundão) and France (Pierroton). For that, for each GEA method, we predicted the genomic offset of each population when transplanted in the environment of the common garden (so instead of predicting the genomic offset into future climates, we predict it in the new environment of the common garden, i.e. space-for-time approach). The rational is that populations with the highest genomic offset in each common garden are expected to have a lower fitness in the common gardens. In this report, we use tree height and mortality in five common gardens as proxies of fitness.

In comparison, we also estimate the climatic transfer distances (CTD) between the climate-of-origin of the populations and the climate of the common garden (i.e. the absolute value of the difference between the climate of origin of the populations and the climate in the common garden between the tree planting date and the measurement date) and we estimate the association between tree height and mortality and the climatic transfer distances. The climatic transfer distances are calculated for the same set of climatic variables as the one used to calculate the genomic offset (one CTD per climatic variable).

1 Data

1.1 Genomic offset predictions

We load the genomic offset predictions estimated from the different GEAs.

Code
df <- lapply(c("control","cand"), function(x){

list_snps <- list()

list_snps[[x]]$GDM <- readRDS(file=here("outputs/GDM/go_predictions.rds"))[[x]][["go_cg"]]
list_snps[[x]]$GF <- readRDS(file=here("outputs/GF/go_predictions.rds"))[[x]][["go_cg"]]
list_snps[[x]]$RDA <- readRDS(file=here("outputs/RDA/go_predictions.rds"))[[x]][["go_cg"]]
list_snps[[x]]$LFMM <- readRDS(file=here("outputs/LFMM/go_predictions_snpsets.rds"))[[x]][["go_cg"]]


list_snps <- list_snps[[x]] %>% 
  bind_rows(.id="method_type") %>% 
  mutate(method_input = x)

return(list_snps)
}) %>% bind_rows()


df <- readRDS(file=here("outputs/LFMM/go_predictions_allsnps.rds"))[["go_cg"]] %>% 
   mutate(method_type = "LFMM",
          method_input = "all") %>% 
  bind_rows(df) %>% 
  pivot_longer(cols=c(readRDS(here("data/ClimaticData/CommonGardens/ClimateCG.rds"))[["cg"]]),names_to="cg",values_to="varX") %>% 
  mutate(method = paste0(method_type, "_",method_input))

1.2 Climatic transfer distances

We calculate the climatic transfer distances.

Code
# selected climatic variables
clim_var <- readRDS(here("data/ClimaticData/NamesSelectedVariables.rds"))

# climatic data in the common gardens (btw planting and measurement dates)
cg_clim <-readRDS(here("data/ClimaticData/CommonGardens/ClimateCG.rds")) %>% 
  dplyr::select(cg,any_of(clim_var)) 

# Loading point estimate climatic data
adj <- "noADJ"  # not adjusted for elevation
ref_period <- "ref_1901_1950" # reference period 1901-1950
clim_past <- readRDS(here(paste0("data/ClimaticData/MaritimePinePops/ClimatePopulationLocationPointEstimates_ReferencePeriods_",adj,".rds")))[[ref_period]]$ref_means %>%
  dplyr::select(pop,any_of(clim_var))


df <- lapply(cg_clim[["cg"]], function(x){
  
for(var in clim_var){
  
  clim_past[[var]] <- ( clim_past[[var]] - cg_clim %>% filter(cg == x) %>%  pull(var) ) %>% abs()
} 
  
return(clim_past)
}) %>%  
  setNames(cg_clim[["cg"]]) %>% 
  bind_rows(.id="cg") %>% 
  pivot_longer(cols=any_of(clim_var),names_to="method_input",values_to="varX") %>% 
  mutate(method_type="CTD",
         method=paste0("CTD_",method_input)) %>% 
  bind_rows(df)

1.3 Phenotypic data

We load the survival and mortality data from the five common gardens.

Code
pheno_data <- readRDS(file=here("data/CommonGardenData/PhenoDataNovember2019_AnnualTraits_UpdatedSept2021_AllSites.rds")) %>% dplyr::rename(pop=prov)

no_nas <- sapply(pheno_data, function(x) length(x)-sum(is.na(x)))

list_pheno <- list()

list_pheno$`Asturias (37 months)` <- table(pheno_data$site,pheno_data$AST_survmar14)["asturias",]
list_pheno$`Bordeaux (85 months)` <- table(pheno_data$site,pheno_data$BDX_surv18)["bordeaux",]
list_pheno$`Cáceres (8 months)` <- table(pheno_data$site,pheno_data$CAC_survdec11)["caceres",]
list_pheno$`Madrid (13 months)` <- table(pheno_data$site,pheno_data$MAD_survdec11)["madrid",]
list_pheno$`Fundão (27 months)` <- table(pheno_data$site,pheno_data$POR_survmay13)["portugal",]

list_pheno %>% 
  bind_rows(.id="cg") %>% 
  setNames(c("Common garden (tree age)","Nb of dead trees","Nb of trees alive")) %>% 
  mutate("Nb of height measurements"=c(no_nas[["AST_htmar14"]],
                                       no_nas[["BDX_htnov18"]],
                                       no_nas[["CAC_htdec11"]],
                                       no_nas[["MAD_htdec11"]],
                                       no_nas[["POR_htmay13"]])) %>% 
  kable_mydf
Common garden (tree age) Nb of dead trees Nb of trees alive Nb of height measurements
Asturias (37 months) 206 3976 3976
Bordeaux (85 months) 119 3225 3217
Cáceres (8 months) 3818 366 340
Madrid (13 months) 3138 1046 1046
Fundão (27 months) 1517 2666 2665

Height and survival measurements in each common garden:

  • Asturias (Spain) in March 2014 when the trees were 37 month-old (trees were planted in February 2011).

  • Bordeaux (France) in November 2018 when the trees were 85 month-old (trees were planted in October 2011).

  • Cáceres (Spain) in December 2011 when the trees were 8 month-old (trees were planted in April 2011). Note that for this common garden, we calculate the bioclimatic variables for the entire year 2011 (instead of calculating the variables only for months between April and December). Indeed, the calculation of the annual bioclimatic variables will be wrong if we do not account for some months, e.g. the mean annual temperature will be higher than expected because we do not account for some winter months.

  • Madrid (Spain) in December 2011 when the trees were 13 month-old (trees were planted in November 2010).

  • Fundão (Portugal) in May 2013 when the trees were 27 month-old (trees were planted in February 2011)

2 Mortality models

In this section, we want to determine whether genomic offset (GO) or climate transfer distances (CTD) are associated with the proportion of dead trees in the populations, independently in the five common gardens. We build a model that assumes that the initial sapling height acts as a confounder. Indeed, trees that were higher at the time of planting have a higher probability of survival. This is particularly true in Madrid and Cáceres (Spain) where the seedlings experienced an extreme drought event the same year they were planted, resulting in very high mortality rates. Here is the model:

\[\begin{align*} a_{p} &\sim \text{Binomial} (N_{p},p_{p}) \\ \text{logit}(p_{p}) &= \beta_{0} + \beta_{H}H_{p} + \beta_{X}X_{p}\\ \end{align*}\]

with \(a_{p}\) the count of individual that died in the population \(p\), \(N_{p}\) the total number of individuals in the population \(p\) (=number of individuals that were initially planted in the common garden), \(p_p\) is the estimated probability of mortality in the population \(p\), \(X_{p}\) is the genomic offset or climatic transfer distance for the population \(p\) and \(H_{p}\) is the BLUPs for height of the population \(p\) (population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. 2022). \(H_{p}\) is used as a proxy of the initial tree height.

We used the following weakly informative priors:

\[\begin{align*} \begin{bmatrix} \beta_{0,c} \\ \beta_{H} \\ \beta_{X} \end{bmatrix} &\sim \mathcal{N}(0,5) \end{align*}\]

2.1 The variables

2.1.1 Initial tree height (confounder)

We load the population-specific intercepts from the model 1 of Archambeau et al. (2022).

Code
mod_arch2022 <- readRDS(file=here("data/Archambeauetal2022_MOD1.rds"))
pop_heights <- mod_arch2022$fit %>% 
  broom.mixed::tidyMCMC(estimate.method = "mean",conf.int = T) %>% # we take the mean of the prov random intercepts
  filter(str_detect(term, "^(r_prov\\[)")) %>% 
  dplyr::rename(height=estimate,pop=term) %>% 
  mutate(pop=str_sub(pop,8,-12))

pop_heights[1:5,] %>% kable_mydf
pop height std.error conf.low conf.high
ALT 0.09 0.04 0.01 0.17
ARM 0.12 0.04 0.04 0.20
ARN -0.09 0.03 -0.16 -0.03
BAY -0.14 0.03 -0.20 -0.07
BON -0.04 0.04 -0.12 0.04

2.1.2 Survival data

The response variable is counts of dead trees. To calculate these counts, we load the survival data in the common gardens, in which 0 corresponds to dead trees and 1 to survivors.

Code
surv_measurements <- c("AST_survmar14","BDX_surv18","CAC_survdec11","MAD_survdec11","POR_survmay13")
survival_data <- pheno_data %>% 
  dplyr::select(site,block,pop,clon,tree,any_of(surv_measurements)) %>% 
  pivot_longer(cols=any_of(surv_measurements), names_to = NULL, values_to = "survival") %>% 
  drop_na(survival) 

survival_data[1:5,] %>% kable_mydf
site block pop clon tree survival
asturias 2 CAR CAR12 CAR12_2 1
asturias 4 STJ STJ14 STJ14_4 1
asturias 5 CUE CUE6 CUE6_5 1
asturias 6 MAD MAD1 MAD1_6 1
asturias 4 ORI ORI12 ORI12_4 1

2.2 Run the models

Stan code of the mortality model:

Code
stancode = stan_model(here("scripts/StanModels/ValidationCommonGarden_BinomialMortalityModel.stan"))
print(stancode)
S4 class stanmodel 'anon_model' coded as follows:
data {
  int N;
  int nb_tot[N];    // Total number of trees in the population
  int nb_dead[N];   // Number of dead trees in the population
  vector[N] H;      // Mean tree height of the population
  vector[N] X;      // Genomic offset or climatic transfer distance of the population
}
parameters {
  real beta_0;
  real beta_H;
  real beta_X;
}
model {
  nb_dead ~ binomial_logit(nb_tot,beta_0 + beta_H * H + beta_X * X);

  beta_0 ~ normal(0,5);//std_normal();
  beta_H ~ normal(0,5);//std_normal();
  beta_X ~ normal(0,5);//std_normal();
}
generated quantities{
  vector[N] log_lik;
  // log likelihood for loo
  for (n in 1:N) {
    log_lik[n] = binomial_logit_lpmf( nb_dead[n] | nb_tot[n] , beta_0 + beta_H * H[n] + beta_X * X[n]);
  }
} 

We run the models for each common garden and each genomic offset predictions/climatic transfer distances, so a total of 5 * 15 = 75 models runs.

Code
coefftab <- lapply(unique(survival_data$site),function(site_i){
  
  
  lapply(unique(df$method), function(method_i){
  
# Subset the data for the site i and method i
    sub_data <- survival_data %>% 
      filter(site == site_i) %>% 
      group_by(pop) %>% 
      dplyr::summarise(nb_dead=n()-sum(survival),nb_tot=n()) # transform survival data into mortality data
    
    sub_data <- df %>% 
      filter(method == method_i & cg == site_i) %>% 
      inner_join(sub_data, by="pop") %>% 
      inner_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))), by="pop") %>% 
      arrange(pop)
      
# Data in a list for Stan 
    stanlist <- list(N = nrow(sub_data),
                     nb_dead = sub_data$nb_dead,
                     nb_tot=sub_data$nb_tot,
                     H=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),
                     X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX))
    
# Running the model
    mod <- sampling(stancode, data = stanlist, iter = 2000, chains = 4, cores = 4, save_warmup = FALSE) 
    
    
    #loo(mod) %>% saveRDS(file=here(paste0("outputs/ValidationCommonGarden/MortalityModels/loos/loo_",site_i,"_",method_i,".rds")))
  
  
# Save coefficients
    broom.mixed::tidyMCMC(mod,
                  droppars = NULL, 
                  estimate.method = "median", 
                  ess = F, 
                  rhat = F, 
                  conf.int = T,
                  conf.level = 0.95) %>% 
    filter(str_detect(term, c('beta'))) %>% 
    dplyr::rename(median=estimate,
                  std_error=std.error,
                  conf_low=conf.low,
                  conf_high=conf.high) %>% 
    mutate(cg=site_i,
           method=method_i,
           .before=1)
    
  }) %>% bind_rows()
  
}) %>% bind_rows()

coefftab %>% saveRDS(file=here("outputs/ValidationCommonGarden/MortalityModels/coefftab.rds"))

2.3 Plot model coefficients

We load the model coefficients:

Code
coefftab <- readRDS(file=here("outputs/ValidationCommonGarden/MortalityModels/coefftab.rds"))
coefftab[1:5,] %>% kable_mydf()
cg method term median std_error conf_low conf_high
asturias CTD_bio1 beta_0 -2.98 0.07 -3.13 -2.85
asturias CTD_bio1 beta_H 0.15 0.07 0.01 0.28
asturias CTD_bio1 beta_X -0.02 0.07 -0.17 0.12
asturias CTD_bio12 beta_0 -2.98 0.07 -3.13 -2.84
asturias CTD_bio12 beta_H 0.14 0.08 -0.01 0.29

Below, we plot the 95% credible intervals of:

  • the \(\beta_X\) coefficients, which stand for the association between the counts of dead trees and the genomic offset predictions (i.e. capturing the potential maladaptation of the populations when transplanted in the new environment of the common gardens) or the climatic tranfer distances.

  • the \(\beta_H\) coefficients, which stand for the association between the counts of dead trees and the BLUPs for height in the five common gardens, used as a proxy of the initial seedling height (a confounder in the model).

Code
coeff_match <- list(beta_H="$\\beta_{H}$ estimates (effect of initial tree height)",
                    beta_X="$\\beta_{X}$ estimates")
  
p <- lapply(c("beta_X","beta_H"), function(coeff){
  
p <- coefftab %>% 
  filter(term==coeff) %>% 
  left_join(distinct(df[,c("method_type","method_input","cg","method")]), by=c("method","cg")) %>% 
  mutate(cg=case_when(cg=="asturias"~"Asturias (37 months)",
                      cg=="bordeaux"~"Bordeaux (85 months)",
                      cg=="caceres"~"Cáceres (8 months)",
                      cg=="madrid"~"Madrid (13 months)",
                      cg=="portugal"~"Fundão (27 months)"),
         method_input=case_when(method_input=="all" ~ "All SNPs",
                                method_input=="bio1" ~ "Mean annual temperature (°C)",
                                method_input=="bio12" ~ "Annual precipitation (mm)",
                                method_input=="bio15" ~ "Precipitation seasonality (index)",
                                method_input=="bio3" ~ "Isothermality (index)",
                                method_input=="bio4" ~ "Temperature seasonality (°C)",
                                method_input=="cand" ~ "Candidate SNPs",
                                method_input=="control" ~ "Control SNPs",
                                method_input=="SHM" ~ "Summer heat moisture index (°C/mm)")) %>% 
  mutate(method_input=factor(method_input, levels=c("Mean annual temperature (°C)",
                                                    "Annual precipitation (mm)",
                                                    "Precipitation seasonality (index)",
                                                    "Isothermality (index)",
                                                    "Temperature seasonality (°C)",
                                                    "Summer heat moisture index (°C/mm)",
                                                    "All SNPs",
                                                    "Candidate SNPs",
                                                    "Control SNPs"))) %>% 
    ggplot(aes(x = method_type,
               y = median,
               ymin = conf_low, 
               ymax = conf_high,
               color=method_input,
               shape=method_input)) +
  geom_hline(yintercept = 0,color="gray") +
  geom_pointinterval(position = position_dodge(width = .4),point_size=3.5,size=3) + # 
  facet_wrap(~cg, ncol=2) + #, scales="free_x", space = "free" 
  ylab(TeX(coeff_match[[coeff]])) + xlab("") +
  scale_color_paletteer_d("ggthemes::calc") +
  scale_shape_manual(values = c(rep(16,6),rep(17,3))) +
  theme_bw() +
  labs(color="",shape="") +
  theme(axis.text.x =  element_text(size=13),
        axis.text.y = element_text(size=13),
        axis.title.y = element_text(size=16),
        axis.title.x = element_text(size=1),
        legend.title=element_text(size=13), 
        strip.text.x = element_text(size = 16),
        strip.background = element_blank(),
        legend.position = c(0.77,0.15),
        legend.text=element_text(size=12),
        panel.grid.minor.x=element_blank(),
        panel.grid.major.x=element_blank()) +
  guides(color=guide_legend(ncol=1),
         shape = guide_legend(override.aes = list(size =2 )))



# save in pdf and png
ggsave(p,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,".pdf")),
       device="pdf",
       height=9,
       width=12)

ggsave(p,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,".png")),
       height=9,
       width=12)

return(p)

})

p
[[1]]


[[2]]

Interpretation

As expected, seedlings that were higher at the planting date had a higher probability of survival in Fundão, Cáceres and Madrid, the three common gardens in which mortality rates were considerable. In Bordeaux and Asturias, mortality rates were very low, and there was no association with the initial seedling height, so we may consider that mortality events in these two common gardens were more probably random events than due to climatic or other environmental conditions. So, to evaluate the association between genomic offset predictions/climatic transfer distances and the counts of dead trees, we only interpret the results in Fundão, Cáceres and Madrid.

In Fundão, Cáceres and Madrid, for most genomic offset predictions, populations experiencing higher mortality rates were also those showing higher genomic offset. The most consistent associations across the three common gardens were obtained with:

  • Gradient Forest (GF) with both candidate and control SNPs.

  • Redundancy analysis (RDA) with both candidate and control SNPs.

  • Latent factor mixed model (LFMM) with all SNPs or control SNPs.

Interestingly, climatic transfer distances generally explain mortality probability less well than genomic offset predictions, and none of them show a consistent association with the number of dead trees across the three common gardens. Note that it is almost the case for the climatic transfer distance calculated based on the temperature seasonality, which shows a positive association with the counts of dead trees in Madrid and Fundão, and almost in Cáceres.

2.4 Experimental design

We export in a table with the number and proportion of dead trees for each population in each common garden.

Code
ExpDesignTab <- lapply(unique(survival_data$site),function(site_i){
  
survival_data %>% 
    filter(site==site_i) %>% 
    dplyr::select(pop,survival) %>% 
    drop_na() %>% 
    group_by(pop) %>% 
    dplyr::summarise(nb_dead=n()-sum(survival),nb_tot=n()) %>% 
    mutate(prop_dead=nb_dead*100/nb_tot)
  
}) %>% 
  setNames(unique(survival_data$site)) %>% 
  bind_rows(.id="cg") %>% 
  pivot_wider(names_from=cg,values_from = c(nb_dead, nb_tot,prop_dead),names_sep="_") %>% 
  dplyr::select(pop,contains("asturias"),contains("bordeaux"),contains("caceres"),contains("madrid"),contains("portugal"))

# Generate the latex table for the Supplementary Information
print(xtable(ExpDesignTab, type = "latex",digits=2), 
      file = paste0(here("tables/ExperimentalDesignTablesSurvivalCommonGarden.tex")), 
      include.rownames=FALSE)

ExpDesignTab %>% kable_mydf()
pop nb_dead_asturias nb_tot_asturias prop_dead_asturias nb_dead_bordeaux nb_tot_bordeaux prop_dead_bordeaux nb_dead_caceres nb_tot_caceres prop_dead_caceres nb_dead_madrid nb_tot_madrid prop_dead_madrid nb_dead_portugal nb_tot_portugal prop_dead_portugal
ALT 3 72 4.17 1 53 1.89 69 72 95.83 52 72 72.22 19 72 26.39
ARM 1 64 1.56 1 64 1.56 56 64 87.50 49 64 76.56 23 64 35.94
ARN 5 136 3.68 3 109 2.75 129 136 94.85 107 136 78.68 46 136 33.82
BAY 8 144 5.56 3 142 2.11 132 144 91.67 113 144 78.47 56 144 38.89
BON 2 71 2.82 1 48 2.08 62 72 86.11 48 72 66.67 14 72 19.44
CAD 5 80 6.25 3 70 4.29 74 80 92.50 62 80 77.50 22 80 27.50
CAR 0 48 0.00 0 46 0.00 44 48 91.67 36 48 75.00 21 48 43.75
CAS 4 80 5.00 6 78 7.69 78 80 97.50 66 80 82.50 31 80 38.75
CEN 2 72 2.78 1 38 2.63 66 72 91.67 50 72 69.44 18 72 25.00
COC 6 144 4.17 3 109 2.75 137 144 95.14 118 144 81.94 53 143 37.06
COM 0 32 0.00 3 32 9.38 28 32 87.50 21 32 65.62 10 32 31.25
CUE 9 224 4.02 9 194 4.64 211 224 94.20 186 224 83.04 101 224 45.09
HOU 11 208 5.29 6 160 3.75 186 208 89.42 145 208 69.71 66 208 31.73
LAM 3 72 4.17 4 60 6.67 68 72 94.44 57 72 79.17 27 72 37.50
LEI 14 184 7.61 10 143 6.99 162 184 88.04 147 184 79.89 64 184 34.78
MAD 0 8 0.00 0 8 0.00 8 8 100.00 5 8 62.50 2 8 25.00
MIM 7 144 4.86 4 126 3.17 135 144 93.75 116 144 80.56 67 144 46.53
OLB 3 176 1.70 4 116 3.45 149 176 84.66 113 176 64.20 27 176 15.34
OLO 22 191 11.52 3 132 2.27 173 192 90.10 145 192 75.52 73 192 38.02
ORI 4 208 1.92 2 183 1.09 196 208 94.23 162 208 77.88 67 208 32.21
PET 11 192 5.73 3 147 2.04 171 192 89.06 136 192 70.83 75 192 39.06
PIA 8 128 6.25 1 110 0.91 111 128 86.72 91 128 71.09 43 128 33.59
PIE 3 72 4.17 4 55 7.27 67 72 93.06 50 72 69.44 17 72 23.61
PLE 8 160 5.00 9 138 6.52 155 160 96.88 131 160 81.88 71 160 44.38
PUE 1 64 1.56 5 56 8.93 58 64 90.62 50 64 78.12 30 64 46.88
QUA 5 136 3.68 3 117 2.56 123 136 90.44 83 136 61.03 36 136 26.47
SAC 1 72 1.39 2 51 3.92 65 72 90.28 56 72 77.78 35 72 48.61
SAL 7 112 6.25 0 69 0.00 104 112 92.86 85 112 75.89 55 112 49.11
SEG 12 168 7.14 7 168 4.17 154 168 91.67 135 168 80.36 77 168 45.83
SIE 2 64 3.12 1 47 2.13 59 64 92.19 40 64 62.50 29 64 45.31
STJ 15 224 6.70 3 180 1.67 190 224 84.82 154 224 68.75 70 224 31.25
TAM 8 120 6.67 4 71 5.63 114 120 95.00 107 120 89.17 60 120 50.00
VAL 5 96 5.21 5 64 7.81 87 96 90.62 69 96 71.88 29 96 30.21
VER 11 216 5.09 5 160 3.12 197 216 91.20 153 216 70.83 83 216 38.43
Code
# Information used in the manuscript
ExpDesignTab[,-1] %>% 
  dplyr::summarise_all(mean) %>% 
  kable_mydf
nb_dead_asturias nb_tot_asturias prop_dead_asturias nb_dead_bordeaux nb_tot_bordeaux prop_dead_bordeaux nb_dead_caceres nb_tot_caceres prop_dead_caceres nb_dead_madrid nb_tot_madrid prop_dead_madrid nb_dead_portugal nb_tot_portugal prop_dead_portugal
6.06 123 4.27 3.5 98.35 3.7 112.29 123.06 91.65 92.29 123.06 74.31 44.62 123.03 35.79

3 Height models

3.1 No confounder

3.1.1 Mathematical model

In this section, we want to determine whether genomic offset (GO) predictions or climate transfer distances (CTD) are associated with tree height, independently in the five common gardens. In each common garden, we perform the following model:

\[\begin{align*} Y_{ipb} &\sim \mathcal{N}(\mu_{p},\sigma^{2}_{r}) \\ \mu_{p} &= B_b + \beta_{X1}X_{p} + \beta_{X2}X^{2}_{p}\\ \end{align*}\]

with \(Y_ipb\) is the height of the individual \(i\) in the population \(p\) in the block \(b\) and \(X_{p}\) is the value of the variable of interest (GO or CTD) in the population \(p\). We include a quadratic term for \(X_p\) to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).

Code
stancode = stan_model(here("scripts/StanModels/ValidationCommonGarden_GaussianModelHeight.stan"))
print(stancode)
S4 class stanmodel 'anon_model' coded as follows:
data {
  int N;                                    // Number of individuals
  vector[N] Y;                              // Response variable (individual tree height)
  vector[N] X;                              // Genomic offset or climatic transfer distances
  int<lower=0> nb_bloc;                     // Number of blocks
  int<lower=0, upper=nb_bloc> bloc[N];      // Blocks
}
parameters {
  vector[nb_bloc] alpha_bloc;               // intercepts of blocks
  real beta_X1;
  real beta_X2;
  real<lower = 0>  sigma_r;
}
transformed parameters {
  vector[N] mu;    // linear predictor
  real R_squared;  // R^2 to evaluate the goodness of fit of the model

  mu = alpha_bloc[bloc] + beta_X1 * X + beta_X2 * square(X);
  R_squared = 1 - variance(Y - mu) / variance(Y);
}
model {

  Y ~ normal(mu, sigma_r); // Likelihood

  sigma_r ~ exponential(1);
  alpha_bloc ~ std_normal();
  beta_X1 ~ std_normal();
  beta_X2 ~ std_normal();
}
generated quantities{
  // log likelihood for loo
  vector[N] log_lik;
  vector[N] muhat;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma_r);
    muhat[n] = normal_rng(mu[n], sigma_r);
  }
} 
Code
height_measurements <- c("AST_htmar14","BDX_htnov18","CAC_htdec11","MAD_htdec11","POR_htmay13")

height_data <- pheno_data %>% 
  dplyr::rename(cg = site) %>% 
  dplyr::select(cg,block,pop,clon,tree,any_of(height_measurements)) %>% 
  pivot_longer(cols=any_of(height_measurements), names_to=NULL,values_to="height",values_drop_na = TRUE)

height_data[1:5,] %>% kable_mydf
cg block pop clon tree height
asturias 2 CAR CAR12 CAR12_2 840
asturias 4 STJ STJ14 STJ14_4 1200
asturias 5 CUE CUE6 CUE6_5 1160
asturias 6 MAD MAD1 MAD1_6 1340
asturias 4 ORI ORI12 ORI12_4 1320
Code
coefftab <- lapply(unique(height_data$cg), function(site_i){
  
  lapply(unique(df$method), function(method_i){
  
    df_sub <- df %>% filter(method == method_i & cg == site_i)
    
    sub_data <- height_data %>% 
      filter(cg == site_i) %>% 
      left_join(df_sub, by = c("cg","pop"))
      
    stanlist <- list(N = nrow(sub_data),
                     Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),
                     X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),
                     nb_bloc = length(unique(sub_data$block)),
                     bloc = as.numeric(as.factor(sub_data$block)))

    mod <- sampling(stancode, data = stanlist, iter = 2000, chains = 4, cores = 4, save_warmup = FALSE) 
    
    # 
    # loo(mod) %>% 
    #   saveRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/loos/loo_",
    #                            site_i,"_",method_i,".rds")))
  
  
# Save coefficients
    broom.mixed::tidyMCMC(mod,
                          pars=c("beta_X1","beta_X2","R_squared","sigma_r","alpha_bloc"),
                          droppars = NULL, 
                          estimate.method = "median", 
                          ess = F, 
                          rhat = F, 
                          conf.int = T,
                          conf.level = 0.95) %>% 
    dplyr::rename(median=estimate,
                  std_error=std.error,
                  conf_low=conf.low,
                  conf_high=conf.high) %>% 
    mutate(cg=site_i,
           method=method_i,
           .before=1)
    
    
  }) %>% bind_rows()
  
}) %>% bind_rows()

coefftab %>% saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_withoutconfounder.rds"))

3.1.2 Plot model coefficients

Below, we plot the 95% credible intervals of the \(\beta_{X_1}\) and \(\beta_{X_2}\) coefficients, which stand for the association between tree height and the genomic offset predictions (i.e. capturing the potential maladaptation of the populations when transplanted in the new environment of the common gardens) or the climatic tranfer distances.

Code
generate_interval_plots <- function(confounder="withoutconfounder"){

coefftab <- readRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/coefftab_",confounder,".rds")))
  
coeff_match <- list(beta_X1="$\\beta_{X_1}$ estimates",
                    beta_X2="$\\beta_{X_2}$ estimates")
  
p <- lapply(c("beta_X1","beta_X2"), function(coeff){
  
p <- coefftab %>% 
  filter(term==coeff) %>% 
  left_join(distinct(df[,c("method_type","method_input","cg","method")]), by=c("method","cg")) %>% 
  mutate(cg=case_when(cg=="asturias"~"Asturias (37 months)",
                      cg=="bordeaux"~"Bordeaux (85 months)",
                      cg=="caceres"~"Cáceres (8 months)",
                      cg=="madrid"~"Madrid (13 months)",
                      cg=="portugal"~"Fundão (27 months)"),
         method_input=case_when(method_input=="all" ~ "All SNPs",
                                method_input=="bio1" ~ "Mean annual temperature (°C)",
                                method_input=="bio12" ~ "Annual precipitation (mm)",
                                method_input=="bio15" ~ "Precipitation seasonality (index)",
                                method_input=="bio3" ~ "Isothermality (index)",
                                method_input=="bio4" ~ "Temperature seasonality (°C)",
                                method_input=="cand" ~ "Candidate SNPs",
                                method_input=="control" ~ "Control SNPs",
                                method_input=="SHM" ~ "Summer heat moisture index (°C/mm)")) %>% 
  mutate(method_input=factor(method_input, levels=c("Mean annual temperature (°C)",
                                                    "Annual precipitation (mm)",
                                                    "Precipitation seasonality (index)",
                                                    "Isothermality (index)",
                                                    "Temperature seasonality (°C)",
                                                    "Summer heat moisture index (°C/mm)",
                                                    "All SNPs",
                                                    "Candidate SNPs",
                                                    "Control SNPs"))) %>% 
    ggplot(aes(x = method_type,
               y = median,
               ymin = conf_low, 
               ymax = conf_high,
               color=method_input,
               shape=method_input)) +
  geom_hline(yintercept = 0,color="gray") +
  geom_pointinterval(position = position_dodge(width = .4),point_size=3.5,size=3) + # 
  facet_wrap(~cg, ncol=2) + #, scales="free_x", space = "free" 
  ylab(TeX(coeff_match[[coeff]])) + xlab("") +
  scale_color_paletteer_d("ggthemes::calc") +
  scale_shape_manual(values = c(rep(16,6),rep(17,3))) +
  theme_bw() +
  labs(color="",shape="") +
  theme(axis.text.x =  element_text(size=13),
        axis.text.y = element_text(size=13),
        axis.title.y = element_text(size=16),
        axis.title.x = element_text(size=1),
        legend.title=element_text(size=13), 
        strip.text.x = element_text(size = 16),
        strip.background = element_blank(),
        legend.position = c(0.77,0.15),
        legend.text=element_text(size=12),
        panel.grid.minor.x=element_blank(),
        panel.grid.major.x=element_blank()) +
  guides(color=guide_legend(ncol=1),
         shape = guide_legend(override.aes = list(size =2 )))



# save in pdf and png
ggsave(p,file=here(paste0("figs/ValidationCommonGarden/HeightModels/",coeff,"_",confounder,".pdf")),
       device="pdf",
       height=9,
       width=12)

ggsave(p,file=here(paste0("figs/ValidationCommonGarden/HeightModels/",coeff,"_",confounder,".png")),
       height=9,
       width=12)

return(p)

})

return(p)
}

generate_interval_plots()
[[1]]


[[2]]

Code
generate_poly_plots <- function(confounder="withoutconfounder"){

  
coefftab <- readRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/coefftab_",confounder,".rds")))

x <- seq(-2,2,0.01)
poly_function <- function(a,b,x) {a * x^2 + b * x}

ggtab <- coefftab %>% 
  filter(term %in% c("beta_X1","beta_X2")) %>% 
  left_join(distinct(df[,c("method_type","method_input","cg","method")]), by=c("method","cg")) %>% 
  mutate(cg=case_when(cg=="asturias"~"Asturias (37 months)",
                      cg=="bordeaux"~"Bordeaux (85 months)",
                      cg=="caceres"~"Cáceres (8 months)",
                      cg=="madrid"~"Madrid (13 months)",
                      cg=="portugal"~"Fundão (27 months)"),
         method_input=case_when(method_input=="all" ~ "All SNPs",
                                method_input=="bio1" ~ "Mean annual temperature (°C)",
                                method_input=="bio12" ~ "Annual precipitation (mm)",
                                method_input=="bio15" ~ "Precipitation seasonality (index)",
                                method_input=="bio3" ~ "Isothermality (index)",
                                method_input=="bio4" ~ "Temperature seasonality (°C)",
                                method_input=="cand" ~ "Candidate SNPs",
                                method_input=="control" ~ "Control SNPs",
                                method_input=="SHM" ~ "Summer heat moisture index (°C/mm)")) %>% 
  pivot_wider(names_from="term", values_from = c("median","std_error","conf_low","conf_high"))



ggtab <- lapply(unique(ggtab$cg), function(cg_i){
lapply(unique(ggtab$method), function(method_i){
  
  subggtab <- ggtab %>% filter(method == method_i & cg == cg_i)
  
  poly_function(a=subggtab$median_beta_X2, b=subggtab$median_beta_X1,x=x)
  }) %>% 
    setNames(unique(ggtab$method)) %>% 
    bind_rows(.id="method") %>% 
    mutate(x_axis = x)
}) %>% 
  setNames(unique(ggtab$cg)) %>% 
  bind_rows(.id="cg") %>% 
  pivot_longer(cols=any_of(unique(ggtab$method)),names_to="method",values_to="predictions") %>% 
  left_join(ggtab[,c("method","method_type", "method_input")] %>% distinct(), by="method")
  

ggplots <- lapply(unique(ggtab$cg), function(cg_i){
  
p_go <- ggtab %>%
  filter(cg %in% cg_i & method_type %in% unique(ggtab$method_type)[!unique(ggtab$method_type) == "CTD"]) %>% 
  mutate(method_input = factor(method_input, levels=c("Candidate SNPs", "Control SNPs", "All SNPs"))) %>% 
  ggplot(aes(x=x_axis,y=predictions,col=method_type,linetype=method_input)) +
  geom_hline(yintercept=0,color="black") +
  geom_line(linewidth=1) +
  ylab(paste0('Standardized height - ',cg_i)) + 
  xlab(TeX("Standardized $X$")) +
  ylim(c(min(ggtab$predictions),max(ggtab$predictions))) +
  scale_color_manual(values=c("#FCA315FF", "#1BB6AFFF","#172869FF", "#D9565CFF"), name = "GEA method") +
  scale_linetype_manual(values=c("solid","dashed","dotted"), name = "SNP set") +
  theme_bw() + 
  theme(axis.text.x = element_text(size=13),
        axis.text.y = element_text(size=13),
        axis.title = element_text(size=16),
        legend.box="horizontal",
        legend.key.width = unit(1,"cm"),
        legend.background = element_blank(),
        legend.box.background = element_blank(),
        legend.key = element_blank(),
        legend.position = c(0.75,0.91)) + 
  guides(color=guide_legend(ncol=1))

#if(!cg_i==unique(ggtab$cg)[1])  p_go <- p_go + theme(legend.position = "none")
#if(!cg_i==unique(ggtab$cg)[length(unique(ggtab$cg))])  p_go <- p_go + xlab("")

p_ctd <- ggtab %>% 
  filter(cg %in% cg_i & method_type %in% "CTD") %>%
  ggplot(aes(x=x_axis,y=predictions,col=method_input)) +
  geom_hline(yintercept=0,color="black") +
  #geom_segment(aes(x = min(x_axis), y = 0, xend = max(x_axis), yend = 0), color="black") +
  geom_line(linewidth=1) +
  ylab('') + 
  xlab(TeX("Standardized $X$")) +
  ylim(c(min(ggtab$predictions),max(ggtab$predictions))) +
  scale_color_manual(values=c("#009392FF", "#66E5FFFF", "#9CCB86FF", "#FFC34CFF", "#ED3F39FF", "#FF8000FF"), 
                     name = "Climatic transfer distance") +
    theme_bw() + 
    theme(axis.text.x = element_text(size=13),
          axis.text.y = element_text(size=13),
          axis.title = element_text(size=16),
          legend.box="horizontal",
          legend.background = element_blank(),
          legend.box.background = element_blank(),
          legend.key = element_blank(),
          legend.position = c(0.55,0.92)) + 
  guides(color=guide_legend(ncol=2))

#if(!cg_i==unique(ggtab$cg)[1])  p_ctd <- p_ctd + theme(legend.position = "none")
#if(!cg_i==unique(ggtab$cg)[length(unique(ggtab$cg))])  p_ctd <- p_ctd + xlab("")

plot_grid(p_go,p_ctd)})

pdf(here(paste0("figs/ValidationCommonGarden/HeightModels/ScatterPlotsPredictedRelationships_",confounder,".pdf")), width=14,height=8)
lapply(ggplots, function(x) x)
dev.off()

return(ggplots)
}

generate_poly_plots()
[[1]]


[[2]]


[[3]]


[[4]]


[[5]]

Interpretation

Higher genomic offset predictions are consistently associated with lower tree height only in Asturias. Most genomic offset predictions are also associated with lower tree height in Bordeaux.

In Fundão, Cáceres and Madrid, the association between genomic offset predictions and tree height are often in the opposite direction as expected.

Interestingly, the climatic transfer distances based in the mean annual temperature and the annual precipitation are negatively associated with tree height in all common gardens (except Cáceres for the CTD based on annual precipitation). These CTD may therefore be better predictors of tree height in common gardens than the genomic offset predictions.

Finally, I think that using tree height as a proxy for fitness to evaluate genomic offset predictions in common gardens may not be appropriate in maritime pine because this trait has a very low genetic-by-environment interaction (see previous papers). Therefore, the association between genomic offset predictions and tree height in Asturias and Bordeaux may not be due to a higher fitness of the trees that are best adapted to the climate in these common gardens, but more probably to height differences that are common across all environments (i.e. populations from Atlantic climates are generally taller). This is confirmed by the models below in which the population-specific BLUPs for height across all common gardens was included as a confounder. When included, the association between genomic offset predictions and tree height in Asturias almost disappears and either disappears or goes in the opposite direction as expected in Bordeaux. Except the genomic offset predictions based on the RDA remain negatively associated with tree height in Bordeaux and Asturias.

Therefore, I am not sure we can go very far in this validation step. We may retain that genomic offset predictions seem to show the most consistent association with tree height when generated with the RDA, and that this association is robust (though smaller) even when the fixed genetic differences across populations are included as confounder.

3.2 Initial height as a counfounder

\[\begin{align*} Y_{ipb} &\sim \mathcal{N}(\mu_{p},\sigma^{2}_{r}) \\ \mu_{p} &= B_b + \beta_{X1}X_{p} + \beta_{X2}X^{2}_{p} + \beta_H H_p\\ \end{align*}\]

with \(H_p\) is a proxy of the initial height of the populations when trees were planted. More precisely, it is the BLUPs for height of the population \(p\) (population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. 2022).

Code
stancode = stan_model(here("scripts/StanModels/ValidationCommonGarden_GaussianModelHeight_WithHeightConfounder.stan"))
print(stancode)
S4 class stanmodel 'anon_model' coded as follows:
data {
  int N;
  vector[N] Y;                              // Response variable (individual height)
  vector[N] X;                              // Genomic offset or climatic transfer distances
  vector[N] H;                              // Initial height of the populations
  int<lower=0> nb_bloc;                     // Number of blocks
  int<lower=0, upper=nb_bloc> bloc[N];      // Blocks
}
parameters {
  vector[nb_bloc] alpha_bloc;
  real beta_X1;
  real beta_X2;
  real beta_H;
  real<lower = 0>  sigma_r;
}
transformed parameters {
  vector[N] mu;    // linear predictor
  real R_squared;  // R^2 to evaluate the goodness of fit of the model

  mu = alpha_bloc[bloc] + beta_X1 * X + beta_X2 * square(X) + beta_H * H;
  R_squared = 1 - variance(H - mu) / variance(H);
}
model {

  Y ~ normal(mu, sigma_r); // Likelihood

  sigma_r ~ exponential(1);
  alpha_bloc ~ std_normal();
  beta_H ~ std_normal();
  beta_X1 ~ std_normal();
  beta_X2 ~ std_normal();
}
generated quantities{
  // log likelihood for loo
  vector[N] log_lik;
  vector[N] muhat;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma_r);
    muhat[n] = normal_rng(mu[n], sigma_r);
  }
} 
Code
coefftab <- lapply(unique(height_data$cg), function(site_i){
  
  lapply(unique(df$method), function(method_i){
  
    df_sub <- df %>% filter(method == method_i & cg == site_i)
    
    sub_data <- height_data %>% 
      filter(cg == site_i) %>% 
      left_join(df_sub, by = c("cg","pop")) %>% 
      
      left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")
      
    stanlist <- list(N = nrow(sub_data),
                     Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),
                     X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),
                     H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),
                     nb_bloc = length(unique(sub_data$block)),
                     bloc = as.numeric(as.factor(sub_data$block)))

    mod <- sampling(stancode, data = stanlist, iter = 2000, chains = 4, cores = 4, save_warmup = FALSE) 
    
    # 
    # loo(mod) %>% 
    #   saveRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/loos/loo_",
    #                            site_i,"_",method_i,".rds")))
  
  
# Save coefficients
    broom.mixed::tidyMCMC(mod,
                          pars=c("beta_X1","beta_X2","beta_H","R_squared","sigma_r","alpha_bloc"),
                          droppars = NULL, 
                          estimate.method = "median", 
                          ess = F, 
                          rhat = F, 
                          conf.int = T,
                          conf.level = 0.95) %>% 
    dplyr::rename(median=estimate,
                  std_error=std.error,
                  conf_low=conf.low,
                  conf_high=conf.high) %>% 
    mutate(cg=site_i,
           method=method_i,
           .before=1)
    
    
  }) %>% bind_rows()
  
}) %>% bind_rows()

coefftab %>% saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_withconfounder.rds"))
Code
generate_interval_plots(confounder="withconfounder")
[[1]]


[[2]]

Code
generate_poly_plots(confounder="withconfounder")
[[1]]


[[2]]


[[3]]


[[4]]


[[5]]

References

Archambeau, Juliette, Marta Benito Garzón, Frédéric Barraquand, Marina de Miguel, Christophe Plomion, and Santiago C González-Martı́nez. 2022. “Combining Climatic and Genomic Data Improves Range-Wide Tree Height Growth Prediction in a Forest Tree.” The American Naturalist 200 (4): E141–59.
Fitzpatrick, Matthew C, Vikram E Chhatre, Raju Y Soolanayakanahally, and Stephen R Keller. 2021. “Experimental Support for Genomic Prediction of Climate Maladaptation Using the Machine Learning Approach Gradient Forests.” Molecular Ecology Resources.